155 research outputs found

    The PhyloPythiaS Web Server for Taxonomic Assignment of Metagenome Sequences

    Metagenome sequencing is becoming common and there is an increasing need for easily accessible tools for data analysis. An essential step is the taxonomic classification of sequence fragments. We describe a web server for the taxonomic assignment of metagenome sequences with PhyloPythiaS. PhyloPythiaS is a fast and accurate sequence composition-based classifier that utilizes the hierarchical relationships between clades. Taxonomic assignments with the web server can be made with a generic model, or with sample-specific models that users can specify and create. Several interactive visualization modes and multiple download formats allow quick and convenient analysis and downstream processing of taxonomic assignments. Here, we demonstrate usage of our web server by taxonomic assignment of metagenome samples from an acidophilic biofilm community of an acid mine and of a microbial community from cow rumen

    CAMISIM: Simulating metagenomes and microbial communities

    © 2019 The Author(s). Background: Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required. Results: We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, on several thousand small data sets generated with CAMISIM. Conclusions: CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation

    The Evolution of X-ray Clusters of Galaxies

    Considerable progress has been made over the last decade in the study of the evolutionary trends of the population of galaxy clusters in the Universe. In this review we focus on observations in the X-ray band. X-ray surveys with the ROSAT satellite, supplemented by follow-up studies with ASCA and Beppo-SAX, have allowed an assessment of the evolution of the space density of clusters out to z~1, and the evolution of the physical properties of the intra-cluster medium out to z~0.5. With the advent of Chandra and Newton-XMM, and their unprecedented sensitivity and angular resolution, these studies have been extended beyond redshift unity and have revealed the complexity of the thermodynamical structure of clusters. The properties of the intra-cluster gas are significantly affected by non-gravitational processes including star formation and Active Galactic Nucleus (AGN) activity. Convincing evidence has emerged for modest evolution of both the bulk of the X-ray cluster population and their thermodynamical properties since redshift unity. Such an observational scenario is consistent with hierarchical models of structure formation in a flat low density universe with Omega_m=0.3 and sigma_8=0.7-0.8 for the normalization of the power spectrum. Basic methodologies for construction of X-ray-selected cluster samples are reviewed and implications of cluster evolution for cosmological models are discussed. Comment: 40 pages, 15 figures. Full resolution figures can be downloaded from http://www.eso.org/~prosati/ARAA

    Functional analysis of metagenomes and metatranscriptomes using SEED and KEGG

    Background: Metagenomics is the study of microbial organisms using sequencing applied directly to environmental samples. Technological advances in next-generation sequencing methods are fueling a rapid increase in the number and scope of metagenome projects. While metagenomics provides information on the gene content, metatranscriptomics aims at understanding gene expression patterns in microbial communities. The initial computational analysis of a metagenome or metatranscriptome addresses three questions: (1) Who is out there? (2) What are they doing? and (3) How do different datasets compare? There is a need for new computational tools to answer these questions. In 2007, the program MEGAN (MEtaGenome ANalyzer) was released, as a standalone interactive tool for analyzing the taxonomic content of a single metagenome dataset. The program has subsequently been extended to support comparative analyses of multiple datasets. Results: The focus of this paper is to report on new features of MEGAN that allow the functional analysis of multiple metagenomes (and metatranscriptomes) based on the SEED hierarchy and KEGG pathways. We have compared our results with the MG-RAST service for different datasets. Conclusions: The MEGAN program now allows the interactive analysis and comparison of the taxonomical and functional content of multiple datasets. As a stand-alone tool, MEGAN provides an alternative to web portals for scientists that have concerns about uploading their unpublished data to a website

    Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences

    Profiling phylogenetic marker genes, such as the 16S rRNA gene, is a key tool for studies of microbial communities but does not provide direct evidence of a community’s functional capabilities. Here we describe PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States), a computational approach to predict the functional composition of a metagenome using marker gene data and a database of reference genomes. PICRUSt uses an extended ancestral-state reconstruction algorithm to predict which gene families are present and then combines gene families to estimate the composite metagenome. Using 16S information, PICRUSt recaptures key findings from the Human Microbiome Project and accurately predicts the abundance of gene families in host-associated and environmental communities, with quantifiable uncertainty. Our results demonstrate that phylogeny and function are sufficiently linked that this ‘predictive metagenomic’ approach should provide useful insights into the thousands of uncultivated microbial communities for which only marker gene surveys are currently available

    Analysis and comparison of very large metagenomes with fast clustering and functional annotation

    Background: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.

    Clustering metagenomic sequences with interpolated Markov models

    Background: Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects. Results: We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and supervised learning method Phymm called PHYSCIMM that performs better when evolutionarily close training genomes are available. Conclusions: SCIMM and PHYSCIMM are highly accurate methods to cluster metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHYSCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHYSCIMM are available open source from http://www.cbcb.umd.edu/software/scimm.

    Coordinating Environmental Genomics and Geochemistry Reveals Metabolic Transitions in a Hot Spring Ecosystem

    We have constructed a conceptual model of biogeochemical cycles and metabolic and microbial community shifts within a hot spring ecosystem via coordinated analysis of the “Bison Pool” (BP) Environmental Genome and a complementary contextual geochemical dataset of ~75 geochemical parameters. 2,321 16S rRNA clones and 470 megabases of environmental sequence data were produced from biofilms at five sites along the outflow of BP, an alkaline hot spring in Sentinel Meadow (Lower Geyser Basin) of Yellowstone National Park. This channel acts as a >22 m gradient of decreasing temperature, increasing dissolved oxygen, and changing availability of biologically important chemical species, such as those containing nitrogen and sulfur. Microbial life at BP transitions from a 92°C chemotrophic streamer biofilm community in the BP source pool to a 56°C phototrophic mat community. We improved automated annotation of the BP environmental genomes using BLAST-based Markov clustering. We have also assigned environmental genome sequences to individual microbial community members by complementing traditional homology-based assignment with nucleotide word-usage algorithms, allowing more than 70% of all reads to be assigned to source organisms. This assignment yields high genome coverage in dominant community members, facilitating reconstruction of nearly complete metabolic profiles and in-depth analysis of the relation between geochemical and metabolic changes along the outflow. We show that changes in environmental conditions and energy availability are associated with dramatic shifts in microbial communities and metabolic function. We have also identified an organism constituting a novel phylum in a metabolic “transition” community, located physically between the chemotroph- and phototroph-dominated sites. The complementary analysis of biogeochemical and environmental genomic data from BP has allowed us to build ecosystem-based conceptual models for this hot spring, reconstructing whole metabolic networks in order to illuminate community roles in shaping and responding to geochemical variability

    WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads

    Gerlach W, Jünemann S, Tille F, Goesmann A, Stoye J. WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads. BMC Bioinformatics. 2009;10(1):430. Background: Metagenomics is a new field of research on natural microbial communities. High-throughput sequencing techniques like 454 or Solexa-Illumina promise new possibilities as they are able to produce huge amounts of data in much shorter time and with less efforts and costs than the traditional Sanger technique. But the data produced comes in even shorter reads (35-100 basepairs with Illumina, 100-500 basepairs with 454-sequencing). CARMA is a new software pipeline for the characterisation of species composition and the genetic potential of microbial samples using short, unassembled reads. Results: In this paper, we introduce WebCARMA, a refined version of CARMA available as a web application for the taxonomic and functional classification of unassembled (ultra-)short reads from metagenomic communities. In addition, we have analysed the applicability of ultra-short reads in metagenomics. Conclusions: We show that unassembled reads as short as 35 bp can be used for the taxonomic classification of a metagenome. The web application is freely available at http://webcarma.cebitec.uni-bielefeld.d